3… 2… 1… Backups

The rule for best practice backups is that you need to have three copies of anything important, stored in at least 2 different media, with at least 1 seperate location. This way if a site gets destroyed (Say Google Suddenly Cancels Google Drive, they’ve canceled weirder), you still have whatever is at your other location. If one format becomes difficult to read (Say CD players become untennable to get ahold of as obsolete tech), you still have the alternate format for your data. Overall this strategy makes you safe to most data loss events. If you keep things organized with your backups you also have the ability to always find things as you go through this.

RSync and RClone

Using Rsync was the first serious backup effort that I actually figured out to some extent. With the ability to hookup to everything it’s a great way to move data between different cloud services or even from the cloud onto a local system. Hooked up to a cron job it really is an ideal setup. My favorite use for this service is however to move from a bad system to a good system. If you currently have spotty backups to various locations and are making the swap over to a properly backed up 3, 2, 1 setup; grabbing all that data is going to be a very major task.

Ceph

So keeping a reliable drive system my goto used to be that drives should be in a safe raid configuration, or tied together with some form of ZFS. This is no longer my goto. Ceph manages all storage directly at the device level and keeps things properly copied, in such a way that if you have enough drives, you can lose drives constantly, without having to manually rebuild. Just need enough drives to keep ahead of losses. You can even have your multiple locations and multiple servers build in to your setup through sufficient configuartion. If a server goes down and a new one is added, then the data replication will simply be rebuilt on the new system.

I found this great calculator online for setting up your Ceph cluster: link

One of the best things about Ceph Storage Clusters is the fact that they can happily handle various sized drives without needing to downsize the larger drives to match the smaller, or creating any additional overhead. The system can handle making data highly available, so that servers can go down intermittently without destroying the cluster access as a whole. It’s honestly such a powerful system as to make it surprising that it isn’t more popular among individuals. Another very nice feature that is underappreciated is the fact that it can scale as more storage is added, without any drawbacks.

An example use case for an individual would be setting up their network storage setup. Say you start with 3 1TB drives, and are willing to be a bit risky with only 1 replica of any data. You’ll have a safe cluster size of 2TB, if you want 2 Replicas you need to drop down to 1TB max storage. This works for awhile, and then you buy a 10TB drive that you append, the system will rebalance the data for best safety, and you get to choose to potentially add additional replicas if you feel it’s necessary. You can then have a safe size of either 3TB (You’re limited by the size of your smaller nodes), or a risky cluster size upto 13TB. If you add an additional replica, you drop the sizes by half. If you throw an additional 10TB drive on there you can bring a 1 Repilica safe cluster size upto 13TB.

If a drive fails, you can simply replace it without bringing down the cluster. So long as not too much of the cluster fails at any point, you’re safe from data loss or downtime. With Ceph, your biggest problem becomes finding places to plug sufficient drives, most techie people will have a bank of lower space drives, that they would love to be able to add to their clusters, having sufficient power and sata ports to support them all is the actual challenge with this setup.

Home Ceph - 1 Node More to Come

I setup a single node ceph cluster using Hyper V and 1 drive on my system. The idea is that ideally I’d like to expand out to many nodes, and many drives; however, for the time being setting up the access and a single OSD is sufficient to learn a little bit.

I installed Ubuntu 22.04 Server on a Hyper-V VM and followed Ubuntu’s instructions for adding a drive and creating a single node with a single pool.
I created a user and grabbed their keyring information, transferring that over to a Windows machine where I’d already installed the Ceph for Windows utilities.
Inside I followed the minimum configuration file for creating ceph connection
I had originally forgotten to install Dokan which allows for creating virtual file systems within windows

My intention with the setup is to add additional ceph nodes as possible to try and increase my available storage and reliablity Starting with my ~200GB of unsafe storage I’ll add replication and additional safety as I increase my cluster size.

Cloud Storage

Cloud storage seems cheap, but they’ll torture you if you actually use what you pay for. I’m rather fed up with google drives storage settings, and I need to go and see if I can disable the warnings. I’ve got an account with a grand total of 4TB of storage. It’s thanks to backups, 85% full. This drives google off the wall as they sell that storage clearly assuming you won’t use it all. Comparing it to their bucket storage costs, google drive is a steal of a deal, that is so long as you assume that google is keeping all your data full accessible and properly redundantly backed up (Hint, google and you’ll find instances of google losing peoples files, at least temporarily…).

Cloud Storage acts as a valid backup for local storage; however, for the average data hoarder it can get quite expensive. Especially if it’s going to be a method you’re actually acessing on a regular basis. At time of writing:

Google Cloud Bucket Storage, $0.020/GB Standard Storage - $0.0012/GB Archival Storage. If we up the storage quantity to the TB range, $1.20/TB of archival storage, isn’t a lot. But some will still object to having to pay monthly for storage, plus costs for network operations to storage and retrieve the data. I estimate these to cost roughly $50/TB retrieved.
AWS Glacier Pricing (Amazons Archival pricing) $0.0036 per GB / Month. This is a surprising amount more expensive than Google’s archival storage; however, to the credit of amazon they are a lot more clear about the costs of the data’s retreival at ~$90/TB. Which is again higher. I guess google drive is just cheaper at storage than AWS.
IBM Storage, $0.0230/GB for their standard storage, and $0.0013/GB Archival Storage. IBM’s Archival storage has the caviat that getting the data may take upto 12 hours; however, given a cost on-par with google at $1.30/TB. I’d say this is an acceptable compromise for cheap archival backups. The retrieval cost is $0.0209/GB which assuming this includes actual network egress is so much cheaper than the alternatives that it could very well be an exceptional option for anyone interested in longer term backups.

There are other storage solutions in the cloud worth consideirng; however, a lot of them can be confusing because the pricing structure is so heavily dependent on the level of activity of the storage, as well as the physical distance. Another thing to bear in mind is even simply the minimum storage time. A lot of the archival costs are based on a minimum time storaged of 1 year. $1.20/TB/month is $13.40/TB/year. Again not a lot of money, but it can certainly creep up. Using the cloud as an emergency rebuild option is not a bad idea, the high costs for retrieval coupled with the relatively low costs for the monthly storage bill is fairly reasonable. Especially remembering that you can compress everything that you’re putting into your deeper archival storage.

Hex OS

SO LTT sponsored the development of an operating system specifically for making yourself clean and easy NAS systems. I gotta say it looks flipping cool; however, I’ve also got to say. Holy cow it’s overpriced in my opinion. Released at $99 per server on black friday it was a reasonable prospect. A little more than I would be willing to spend on a lark, but I could see a lot of people doing what they do in the video, buying a trash system, loading some drives, and calling it their NAS. Not a bad idea and while as someone who aims to turn these things into learning experineces I wouldn’t be running to pay more than ~$20 / license I can see a reasonable slice of tech literate but not tech exceptional people doing this for their home setups, and even small businesses. However, at an early bird beta price of $200 a license. No… It’s not that it’s a bad product, it’s a bad price for the product.

The primary offering as I see it is trying to simplify redundant safe storage for pro level consumers. Okay I’ve listed the options I’m considering and looking into for my backups elsewhere on this page. They’re options I’ll consider but I wouldn’t expect someone without either time or a desire for pain to actually consider them. Lets compare to a My Cloud Pro Series PR2100 w/ 28TB of storage is only ~$1600. This is a lot more than the $200 a license; however, it includes storage. It’s all new, and it’s made by a company with a LONG proven track record. Buy 2 and it’ll even do offsite backups.
If the high price for new (Which you’d have to pay for new trustworthy hardware anyways) is too much for you. Go used of these established brands instead. QNAP TS-859 Pro Network for $850 bucks with 16TB of storage across 8 different drives on facebook marketplace right now. That’s the same quality hardware, but with QNAPs software instead of an unproven company. Again Hex OS is not a bad product, but I think it’s not a great value for money.

Okay so if that is all not enough to convince that this is not an approach I’ll take at this price. Look at the cost of a brand new Windows License. Direct from Windows you’re looking at $200 for a pro windows license. Windows can be setup as a file server. Windows also has a substantially larger development overhead as it’s a full system. What is better is your old hardware likely already had Windows coming with it. What’s more is conveniance is the HexOS selling feature, Windows will play nicely with other windows machines. Especially on a private network that’s been setup properly. I’m sorry but all of this considered Hex OS should be $100 at its full price, not at its heavily discounted please test our new software price.